    Qualitative Effects of Knowledge Rules in Probabilistic Data Integration

    Get PDF
    One of the problems in data integration is data overlap: different data sources contain data on the same real-world entities. Much development time in data integration projects is devoted to entity resolution. Advanced similarity measurement techniques are often used to remove semantic duplicates from the integration result or to resolve other semantic conflicts, but it proves impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database, enabling the integration result to be meaningfully used right away. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly, by measuring the quality of answers to queries on the integrated data set in an information retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on the integration quality. It shows that our approach indeed reduces development effort, rather than merely shifting it to rule definition and threshold tuning, by demonstrating that setting rough, safe thresholds and defining only a few rules suffices to produce a ‘good enough’ integration that can be meaningfully used.
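    The abstract does not spell out the quality measure; a minimal sketch, assuming quality is scored in the usual information-retrieval way as precision, recall, and F1 of a query's answer set against a reference (ground-truth) answer set:

    # Hypothetical sketch: IR-style quality of a query answer set against a
    # reference answer set (the report may use a different exact measure).
    def answer_quality(returned: set, reference: set) -> dict:
        """Precision, recall and F1 of returned answers vs. the expected answers."""
        true_positives = len(returned & reference)
        precision = true_positives / len(returned) if returned else 0.0
        recall = true_positives / len(reference) if reference else 0.0
        f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
        return {"precision": precision, "recall": recall, "f1": f1}

    # Example: answers produced from the probabilistic integration vs. ground truth.
    print(answer_quality({"e1", "e2", "e5"}, {"e1", "e2", "e3"}))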

    Quality Measures in Uncertain Data Management

    Get PDF
    Many applications deal with data that is uncertain. Some examples are applications dealing with sensor information, data integration applications, and healthcare applications. Instead of each application having to deal with the uncertainty itself, it should be the responsibility of the DBMS to manage all data, including uncertain data. Several projects do research on this topic. In this paper, we introduce four measures to be used to assess and compare important characteristics of data and systems.

    User Feedback in Probabilistic XML

    Get PDF
    Data integration is a challenging problem in many application areas. Approaches mostly attempt to resolve semantic uncertainty and conflicts between information sources as part of the data integration process. In some application areas, this is impractical or even prohibitive, for example, in an ambient environment where devices on an ad hoc basis have to exchange information autonomously. We have proposed a probabilistic XML approach that allows data integration without user involvement by storing semantic uncertainty and conflicts in the integrated XML data. As a consequence, the integrated information source represents all possible appearances of objects in the real world, the so-called possible worlds. In this paper, we show how user feedback on query results can resolve semantic uncertainty and conflicts in the integrated data. Hence, user involvement is effectively postponed to query time, when a user is already interacting actively with the system. The technique relates positive and negative statements on query answers to the possible worlds of the information source, thereby either reinforcing, penalizing, or eliminating possible worlds. We show that after repeated user feedback, an integrated information source better resembles the real world and may converge towards a non-probabilistic information source.
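    A minimal sketch of the possible-worlds bookkeeping the abstract describes; the names and the exact update scheme (a multiplicative boost for positive feedback, elimination for negative feedback, then renormalization) are illustrative assumptions, not the paper's algorithm:

    # Illustrative sketch (not the paper's exact algorithm): possible worlds with
    # probabilities, updated by positive/negative feedback on a query answer.
    def apply_feedback(worlds, answer, positive, boost=2.0):
        """worlds maps world_id -> (answer_set, probability); returns renormalized worlds."""
        updated = {}
        for wid, (answers, prob) in worlds.items():
            contains = answer in answers
            if positive:
                weight = boost if contains else 1.0    # reinforce supporting worlds
            else:
                weight = 0.0 if contains else 1.0      # eliminate contradicted worlds
            if prob * weight > 0:
                updated[wid] = (answers, prob * weight)
        total = sum(p for _, p in updated.values())
        if total == 0:                                 # all worlds contradicted; keep input
            return worlds
        return {wid: (a, p / total) for wid, (a, p) in updated.items()}

    worlds = {"w1": ({"Alice", "Bob"}, 0.5), "w2": ({"Alice"}, 0.3), "w3": ({"Bob"}, 0.2)}
    worlds = apply_feedback(worlds, "Bob", positive=False)   # user says "Bob" is wrong
    print(worlds)   # only w2 survives, its probability renormalized to 1.0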

    Optimally Controlled Field-Free Orientation of the Kicked Molecule

    Full text link
    Efficient and long-lived field-free molecular orientation is achieved using only two kicks appropriately delayed in time. The understanding of the mechanism rests upon a molecular target state providing the best compromise between efficiency and persistence. An optimal control scheme is used to fix the free parameters (the kick amplitudes and the delay between them). The limited number of kicks, the robustness, and the transposability to different molecular systems argue in favor of the process when considering its experimental feasibility. Comment: 5 pages, 2 figures (version 2 contains some minor additions and corrects many misprints).
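    The abstract does not give the underlying model; as background only (not reproduced from the paper), the standard sudden-kick description of a rigid polar rotor driven by a pulse much shorter than the rotational period, with orientation tracked through the expectation value of cos(theta), reads in LaTeX:

    % Background sketch, not taken from the paper: sudden-kick picture for a
    % rigid polar rotor; the pulse imprints a phase set by the kick strength A.
    H(t) = B\,\hat{J}^{2} - \mu\,\varepsilon(t)\cos\theta,
    \qquad
    A = \frac{\mu}{\hbar}\int \varepsilon(t)\,\mathrm{d}t,
    \qquad
    \psi(0^{+}) = e^{\,i A \cos\theta}\,\psi(0^{-}).

    % Field-free orientation after the kick(s) is measured by
    \langle\cos\theta\rangle(t) = \langle\psi(t)\,|\cos\theta|\,\psi(t)\rangle,
    % which the two-kick scheme aims to make both large and long-lived.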

    Automated Problem Decomposition for the Boolean Domain with Genetic Programming

    Get PDF
    Researchers have been interested in exploring the regularities and modularity of the problem space in genetic programming (GP) with the aim of decomposing the original problem into several smaller subproblems. The main motivation is to allow GP to deal with more complex problems. Most previous work on modularity in GP emphasises the structure of the modules used to encapsulate code and/or promote code reuse, rather than the decomposition of the original problem itself. In this paper we propose a problem decomposition strategy that allows a GP search to find solutions for the subproblems and combines the individual solutions into the complete solution to the problem.
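    The abstract does not detail the decomposition strategy; purely as an illustration of the general idea (splitting a Boolean target into subproblems whose solutions are combined), even-parity decomposes into the parities of two disjoint input halves combined with XNOR:

    # Illustration only (not the paper's strategy): a Boolean target such as
    # even-parity can be split into subproblems over disjoint input halves, each
    # solvable separately (e.g., by a GP run), then recombined.
    from functools import reduce
    from itertools import product

    def parity(bits):
        """Even-parity of a bit sequence: True if the number of 1s is even."""
        return reduce(lambda acc, b: acc ^ b, bits, 0) == 0

    def combined_parity(bits):
        """Solve two half-size subproblems, then combine the sub-answers."""
        half = len(bits) // 2
        left, right = parity(bits[:half]), parity(bits[half:])
        return not (left ^ right)   # XNOR of the two sub-answers

    # The decomposed solution agrees with the direct one on all 6-bit inputs.
    assert all(parity(b) == combined_parity(b) for b in product((0, 1), repeat=6))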

    How Noisy Data Affects Geometric Semantic Genetic Programming

    Full text link
    Noise is a consequence of acquiring and pre-processing data from the environment, and shows fluctuations from different sources, e.g., from sensors, signal processing technology, or even human error. As a machine learning technique, Genetic Programming (GP) is not immune to this problem, which the field has frequently addressed. Recently, Geometric Semantic Genetic Programming (GSGP), a semantic-aware branch of GP, has shown robustness and high generalization capability. Researchers believe these characteristics may be associated with a lower sensitivity to noisy data. However, there is no systematic study on this matter. This paper performs a deep analysis of GSGP performance in the presence of noise. Using 15 synthetic datasets where noise can be controlled, we added different ratios of noise to the data and compared the results obtained with those of a canonical GP. The results show that, as we increase the percentage of noisy instances, the generalization performance degradation is more pronounced in GSGP than in GP. However, in general, GSGP is more robust to noise than GP in the presence of up to 10% of noise, and presents no statistical difference for values higher than that in the test bed. Comment: 8 pages, in proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2017), Berlin, Germany.
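    As an illustration of the kind of setup described above, noise can be injected into a controlled fraction of a synthetic dataset's target values; the Gaussian noise model, ratios, and parameter values below are assumptions, not the paper's exact protocol:

    # Illustrative sketch: perturb the target value of a chosen fraction of
    # instances in a synthetic regression dataset (noise model is assumed).
    import random

    def add_noise(targets, ratio, sigma=0.1, seed=42):
        """Return a copy of `targets` with Gaussian noise added to `ratio` of them."""
        rng = random.Random(seed)
        noisy = list(targets)
        n_noisy = int(round(ratio * len(noisy)))
        for i in rng.sample(range(len(noisy)), n_noisy):
            noisy[i] += rng.gauss(0.0, sigma)
        return noisy

    clean = [x ** 2 for x in range(20)]      # toy synthetic target values
    for ratio in (0.0, 0.1, 0.2):            # e.g., compare GP vs. GSGP per noise ratio
        print(ratio, add_noise(clean, ratio)[:5])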

    The strength of weak bots

    Get PDF
    Some fear that social bots, automated accounts on online social networks, propagate falsehoods that can harm public opinion formation and democratic decision-making. Empirical research, however, has produced puzzling findings. On the one hand, the content emitted by bots tends to spread very quickly in the networks. On the other hand, it turns out that bots' ability to contact human users tends to be very limited. Here we analyze an agent-based model of social influence in networks that explains this inconsistency. We show that bots may be successful in spreading falsehoods not despite their limited direct impact on human users, but because of this limitation. Our model suggests that bots with limited direct impact on humans may be more, and not less, effective in spreading their views in the social network, because their direct contacts keep exerting influence on users that the bot does not reach directly. Highly active and well-connected bots, in contrast, may have a strong impact on their direct contacts, but these contacts grow too dissimilar from their network neighbors to further spread the bot's content. To demonstrate this effect, we included bots in Axelrod's seminal model of the dissemination of culture and conducted simulation experiments demonstrating the strength of weak bots. A series of sensitivity analyses shows that the finding is robust, in particular when the model is tailored to the context of online social networks. We discuss implications for future empirical research and for developers of approaches to detect bots and misinformation.
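    A minimal sketch of the kind of experiment described: Axelrod-style cultural dissemination on a ring with one bot whose culture never changes and which initiates contact only rarely. The grid topology, trait counts, activity level, and all other parameters are illustrative assumptions, not the paper's settings:

    # Illustrative sketch (parameters are assumptions, not the paper's): Axelrod
    # culture model on a ring with one "weak" bot whose traits never change.
    import random

    F, Q, N, STEPS = 5, 10, 50, 200_000
    rng = random.Random(0)
    agents = [[rng.randrange(Q) for _ in range(F)] for _ in range(N)]
    BOT = 0
    agents[BOT] = [0] * F            # the bot's fixed "falsehood" culture
    BOT_ACTIVITY = 0.05              # weak bot: rarely initiates contact

    def interact(a, b):
        """One Axelrod step: with prob. = similarity, b copies one differing trait of a."""
        same = [f for f in range(F) if agents[a][f] == agents[b][f]]
        diff = [f for f in range(F) if agents[a][f] != agents[b][f]]
        if diff and rng.random() < len(same) / F:
            f = rng.choice(diff)
            agents[b][f] = agents[a][f]

    for _ in range(STEPS):
        a = rng.randrange(N)
        if a == BOT and rng.random() > BOT_ACTIVITY:
            continue                                  # the weak bot mostly stays silent
        b = (a + rng.choice((-1, 1))) % N             # a random ring neighbour
        if b != BOT:                                  # the bot itself never changes
            interact(a, b)

    adopters = sum(agent == agents[BOT] for agent in agents) - 1
    print(f"{adopters} of {N - 1} human agents ended up with the bot's culture")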

    The SHAW experience in Indonesia: The multi-stakeholder approach to sustainable sanitation and hygiene

    Get PDF
    Caring for sanitation is the basis of healthy living for all in the community, but in general it is an individual responsibility. From the late 1990s, the CLTS (Community Led Total Sanitation) approach showed that people can become aware of why sanitation is important for them and their community. They can be triggered to accomplish non-subsidized actions towards an Open Defecation Free (ODF) environment. The Community Based Total Sanitation (STBM) strategy was initiated in 2008 by the Government of Indonesia (GOI). It is a “total sanitation and hygiene” approach, a next generation of CLTS. The Sanitation, Hygiene and Water (SHAW) Programme is the first in Indonesia to apply STBM at large scale and had to develop an approach to make it work. After 4 years, we have reached nearly 1 million people, and a new generation of issues has come up that needs to be solved.

    Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS

    Get PDF
    Trio is a new kind of database system that supports data, uncertainty, and lineage in a fully integrated manner. The first Trio prototype, dubbed Trio-One, is built on top of a conventional DBMS using data and query translation techniques together with a small number of stored procedures. This paper describes Trio-One's translation scheme and system architecture, showing how it efficiently and easily supports the Trio data model and query language.
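    The abstract does not include the encoding itself; a minimal sketch of the general idea of layering uncertainty and lineage on a conventional DBMS (table and column names below are illustrative assumptions, not Trio-One's actual schema): alternatives of an uncertain tuple become ordinary rows carrying a confidence value, and a separate lineage table records which source alternatives each derived alternative depends on.

    # Illustrative encoding sketch (schema names are assumptions, not Trio-One's):
    # each alternative of an uncertain tuple is an ordinary row with a confidence,
    # and lineage rows link derived alternatives to their source alternatives.
    import sqlite3

    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE saw (aid INTEGER PRIMARY KEY, xid INTEGER, witness TEXT,
                          car TEXT, conf REAL);
        CREATE TABLE suspects (aid INTEGER PRIMARY KEY, xid INTEGER, person TEXT,
                               conf REAL);
        CREATE TABLE lineage (derived_aid INTEGER, source_table TEXT, source_aid INTEGER);
    """)

    # Two mutually exclusive alternatives of one uncertain tuple (same xid).
    db.executemany("INSERT INTO saw VALUES (?, ?, ?, ?, ?)",
                   [(1, 1, "Amy", "Honda", 0.6), (2, 1, "Amy", "Mazda", 0.4)])

    # A derived alternative, with its confidence and lineage recorded by the
    # (hypothetical) translation layer rather than by the user.
    db.execute("INSERT INTO suspects VALUES (10, 5, 'Jimmy', 0.6)")
    db.execute("INSERT INTO lineage VALUES (10, 'saw', 1)")

    for row in db.execute("""SELECT s.person, s.conf, l.source_table, l.source_aid
                             FROM suspects s JOIN lineage l ON l.derived_aid = s.aid"""):
        print(row)   # ('Jimmy', 0.6, 'saw', 1)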